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Abstract. The immune response to a pathogen has two basic features. The first 
is the expansion of a few pathogen-specific cells to form a population large enough 
to control the pathogen. The second is the process of differentiation of cells from 
an initial naive phenotype to an effector phenotype which controls the pathogen, and 
subsequently to a memory phenotype that is maintained and responsible for long- 
term protection. The expansion and the differentiation have been considered largely 
independently. Changes in cell populations are typically described using ecologically 
based ordinary differential equation models. In contrast, differentiation of single 
cells is studied within systems biology and is frequently modeled by considering 
changes in gene and protein expression in individual cells. Recent advances in 
experimental systems biology make available for the first time data to allow the 
coupling of population and high dimensional expression data of immune cells during 
infections. Here we describe and develop population- expression models which integrate 
these two processes into systems biology on the multicellular level. When translated 
into mathematical equations, these models result in non-conservative, non-local 
advection-diffusion equations. We describe situations where the population-expression 
approach can make correct inference from data while previous modeling approaches 
based on common simplifying assumptions would fail. We also explore how model 
reduction techniques can be used to build population-expression models, minimizing 
the complexity of the model while keeping the essential features of the system. While 
we consider problems in immunology in this paper, we expect population-expression 
models to be more broadly applicable. 
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1. Introduction 

The central feature of the adaptive immune system is the ability to respond to 
a broad range of pathogens, including emerging threats never before encountered, 
without mounting responses to the native tissues of the body [1]. This dynamic 
is explained by the clonal selection theory, which underlies our understanding of 
immunology. This theory postulates that we begin with a very diverse population of 
immune cells (lymphocytes), with each lymphocyte having a unique and fixed specificity. 
Consequently the number of lymphocytes specific for a given pathogen is very small. 
Following infection these pathogen-specific lyphocytes undergo rapid division (clonal 
expansion) and differentiation into effector cells, which are able to control the pathogen. 
Following clearance of the pathogen some of these lymphocytes differentiate into memory 
cells, which are maintained for extended periods and account for long-term protection. 
The clonal selection theory describes the generation of the T cell and B cell responses. 
In Fig. 1 we show a schematic of clonal selection for a typical CD8 T cell response to a 
viral infection. 
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Figure 1. Schematic illustration of a typical CD8 T cell response to an infection. 
The plot shows the enormous changes in numbers of pathogen-specific CD8 T cells 
during the course of infection, as well as changes in cell phenotype. The response 
has three phases which correspond to population expansion, contraction and stability. 
Differentiation results in changes in the phenotype of cells from naive to effector 
and memory. Typically this type of a response is described by ordinary differential 
equations that govern changes in populations of cells having naive, effector and memory 
phenotypes. 

The enormous changes in population sizes suggested that, as in ecology, 
ordinary differential equation (ODE) models of the populations would prove useful to 
understanding the immune response [2, 3, 4]. In these models cells are restricted to a few 
distinct phenotypes with division, death, and transition rates between the phenotypes 
to describe the dynamics. The models typically ignore how the systems biology on the 
cellular scale governs the rate laws in the models on the population scale. While such 
models have proven useful in addressing a number of population level questions, they 
have their limitations. For the approach to work well, phenotypic states must be well 
resolved and the transitions between them must be rapid. 
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Fig. 2 presents data capturing the dynamics of T cells obtained via flow cytometry. 
This figure shows the density of CD8 T cells following a yellow fever vaccination plotted 
as a function of two surface expressed molecules (CD45RA, a signaling molecule that 
regulates antigen receptor signaling, and CCR7, a molecule which aids in trafficking 
of T cells to lymph-nodes) [ ]. The population gradually transitions from CD45RA 
low to high during the contraction and memory phases. This figure illustrates one 
problem with ODE models of multicellular population dynamics: How does one 
unambiguously partition data into distinct phenotypes when there is considerable 
heterogeneity or gradual transitions? This ambiguity gives rise to subjectivity and 
quantitative disagreement between labs in the analysis of immunological data [6]. 
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Figure 2. The differentiation of human CD8 T cells following yellow fever vaccination. 
These flow cytometry plots show the population of antigen specific CD8 T cells (red) 
responding to the vaccination differentiating from CD45RA negative to positive while 
expanding and then contracting in number. This transition is associated with the 
transition from effector memory to central memory. Reproduced with permission from 



The flow of populations as they differentiate (Fig. 2) is governed largely by the 
systems biology of the cells [7, 8, 9]. (While the term systems biology has been 
used very broadly, in this manuscript we adopt the most common usage, referring 
to models of chemical reaction networks typically within single cells or homogeneous 
cell cultures [10].) Typical systems biology models consist of ODEs or stochastic 
differential equations that model reaction rates between chemical species, providing 
a finer resolution of phenotypic states. 

While population models loose accuracy in not considering the chemical scale, 
systems biology models have contrasting limitations resulting from omission of the 
population dynamics. Typically, the analysis and parameter estimation of differentiating 
populations has been performed on time scales where division is negligible [11]. On 
longer time scales, population dynamics and systems biology are coupled and must be 
considered together. The expression levels of gene products control cell division and 
death rates. In their turn, cell division and death rates change the number of cells 
in various phenotypic states and hence shape the expression profiles of populations. 
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Additionally the process of cell division dilutes expression and can generate spurious 
correlations between expressed chemicals. Clearly, modeling immune system dynamics 
requires an integrated approach, combining population dynamics and systems biology. 

One way to do this is to conceptualize each flow cytometry data set as samples from 
a density in a multidimensional cellular configurational space, where each dimension 
denotes the quantity of a specific chemical. Individual cells would trace out trajectories 
in this configurational space as they differentiate. Unfortunately in vivo single cell 
longitudinal data is difficult to obtain, and for dividing cells the term longitudinal 
is undefined. Thus instead of tracking cells over time, one can focus on tracking 
populations, or distributions of cells in the configurational space. This can be done using 
partial differential equations (PDEs) and related mathematical concepts, an approach 
gaining popularity in theoretical immunology [12, 13, 14]. 

Such population- expression models circumvent our inability to define clean cellular 
phenotypic states. They remove the inherent subjectivity in phenotype discrimination 
[6], and they remove the need to incorporate additional phenotypes to better fit models 
to data. They integrate the within cell stochastic chemical kinetics into models of 
the population dynamics. Ultimately, they allow analysis of the diversity of protein 
expression within populations, how it changes with time, and how the diversity is 
affected by selection. 

The main goal of this paper is to introduce such population-expression modeling, 
explain utility of the approach in the context of toy models, and discuss the 
methodological developments needed for practical applications of the ideas. To achieve 
this, we first introduce the general formalism of population-expression models. Following 
this we provide a number of examples of population-expression models, illustrating 
where ecological based ODE models succeed and fail, how cell division dilutes chemical 
quantities, where single-cell analyses fail to describe the population, and how we can 
infer from data which chemicals may be drivers in regulatory networks. We end with a 
critical look at some of the key problems arising when we confront population-expression 
models with ever increasing dimensionality of experimental datasets. 



2. Population-expression approach: The formulation 

Instead of predefining a limited number of cell phenotypes, our population-expression 
approach takes the abundance of cells with different chemical states as the dynamical 
variable. We denote by p(A, t) the density of cells at time t, with internal biochemical 
expressions (internal states) of A. 

To describe how p(A, t) changes with time, we first consider how a single cell moves 
in the configurational space of A values. Denoting the set of differential equations that 
describe the changing chemical quantities within a single cell by 

f = *t), _ (i) 

the abundance p flows according to the vector field denoted by ^{A). Standard 
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techniques [15] generate an advection equation that describes how the density changes 
according to the vector field ^(A): 

d ^ = -V-[y { AMAt ) ]. (2) 

The quantity in the square brackets denotes the total flux of cells changing in expression 
level as they move through the configurational space. 

Incorporating population dynamics into these equations can be done with additional 
terms for cell death and sources of new cells: 
dp(A,t) 

at 

Here v{A) denotes a cellular death rate that is a function of the chemical concentration 
and T(A) is an influx of new cells entering the system in a chemical state A. 

Cell division can be included by adding nonlocal terms to Eq. (3). For example, if 
in a symmetric cell division, all chemicals in the cell are split equally between the two 
daughters, we have 



-V 



l{A)p{A,t) -v{A)p{A,t)+T{A). (3) 



d P (A,t) = _ v 



i(A)p(A, t) - n(A)p(A, t) + 2 d - 2p(2A)p(2A, t) 



dt 

-is(A)p(A,t)+T(A). (4) 

Here p(A) is the division rate, which we assume depends on the cell age and other 
properties only implicitly through the instantaneous state of the cell, A. In this equation, 
dividing cells are removed from the abundance at A, and two daughters are added to 
the abundance at A from each dividing cell at abundance 2A. The factor of 2 d in the 
last term arises from a subtlety of the non-local calculus, where we are adding to the 
region (A, A + 5) from the region (2A, 2A + 25) which is twice the width in each of the d 
dimensions of A. This equation can also be modified to describe dilution in asymmetric 
cell division. 

Finally we can very naturally incorporate the stochastic fluctuations resulting from 
the chemical dynamics [16, 17]. This is typically done by constructing a chemical master 
equation, and expanding in small relative fluctuations [18]. Expanding to lowest order 
gives Eq. (2). Expanding to next highest order results in a nonlocal analogue of the 
Fokker-Planck equation, which spreads the population in the A space due to stochasticity 
of the intrinsic chemical processes: 

dp(A,t) 
dt 

- u(l)p(A, t) + T(A) + V(D(A)Vp(A, t)). (5) 

Thus the advection dynamics becomes the advection-diffusion dynamics. Such 
approaches to modeling fluctuations in single cells are now commonplace in molecular 
systems biology [19], and many efficient simulation and analysis algorithms have been 
developed [20]. 

As in systems biology, in population-expression models some state variables may 
remain discrete. For example the state of transcription factor binding may be best 



- V 



l{A)p{A, t) - p.(A)p{A, t) + 2 d - 2p(2A)p(2A J t) 
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described by a binary on/off variable, or compartmental spacial dependence could be 
incorporated into the model (e.g. lung, spleen, etc.). In these cases we typically describe 
multiple coupled densities Pi(A, t), with population-expression dynamics, Eq. (5), for 
each density and with terms that couple the equations through transitions between 
the states, such as YlikjPj{A,t). Of possibly high relevance to the current work, 
cross-sectional flow cytometry samples from cellular populations at different time 
points have been used to infer parameters of chemical reaction rates 7 [11]. The 
population-expression approach differs from these analyses by incorporating the effects 
of proliferation, cell death, and dilution by cell division. We show below that these 
effects can substantially bias the resulting expression profile of a population. 

Note that, for much of this paper, we assume that p(A,t) can be measured: that 
the number of samples is large enough so that inference of p is not a hard task. This 
breaks down if d = dim A ^> 1. We will discuss this case in Sec. 4. Similarly, we assume 
that population-expression equations are sufficiently low- dimensional to be numerically 
solvable. When this is not the case, Monte-Carlo simulations might be needed, and we 
briefly touch on this topic in the Discussion. 

3. Population-expression approach: Examples 

In this section, we use the population-expression approach to model simple processes 
of relevance to different aspects of immune dynamics. The examples illustrate the 
inadequacy of single cell systems biology (expression) and ecological based ODE 
(population) modeling approaches. 

3.1. Ecological based ODE model failure: slow expression dynamics 

Ecological based ordinary differential equation models of phenotypical population 
dynamics work well only when phenotypes are sharply defined and transitions between 
them are rapid. This is not always the case. Consider, for example, a transition between 
phenotypes that occurs when an internal state has changed, but the observables take 
time to reach their characteristic values for this new state. For example, a good measure 
of the phenotypic state of a cell may be the binding of transcription factors (TFs) to 
DNA, which is possible but not easy to measure [21]. On the other hand, we routinely 
measure expression levels of protein using flow cytometry. These levels are typically 
controlled by transcription factor binding, but changes in protein expression lag behind 
changes in TF binding. Thus the dynamics of switching observed in flow cytometry 
data may be non-trivial. 

Here we model cells having a discrete state denoting transcription factor binding 
("off" or "on"), and a continuous variable A for expression level. Off cells can switch 
to the on state with the rate k, and the dynamics of A is given by dA/dt = 7ofr/on, 
where 7 ff/ n depends on the state. Namely, the chemical A has two possible production 
rates: a Q s, and a on . In both states there is the same degradation rate /3. This kinetics 
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may correspond, for example, to the expression and decay of mRNA or protein if mRNA 
levels equilibrate quickly in comparison to the protein dynamics. We consider the cells in 
the two states separately: p g (A, t) is the number of cells in the off state with expression 
level A, and p on (A,t) are the cells in the on state. The population-expression equations 
are: 

dpos(A,t) d 

- Tr-r[{a s - /3A)p oS (A,t)\ + 



dt OA 
1 d 2 



2dA< 



[{a oS + f3A)p oS (A, t)} - k PoS (A, t), (6) 



-^[(a oa -f3A)p on (At)} + 

1 d 2 

2 OA 2 [(CW + /3A)/?on ^' *)] + k Pos(A t). (7) 
Similar models for single cells in equilibrium [22, 23], and even off-equilibrium for simpler 
cases [24], have been solved exactly. Here we analyze this system numerically in the 
non-equilibrium context. We solve these equations with a method of lines integration 
with a finite differencing approximation for A derivatives, and Matlab ODE45 routine 
for integrating forward in time. 

Fig. 3 plots numerical solutions of p(A,t) = p g(A,t) + p on (A,t), defined by 
Eqs. (6, 7), for two contrasting pictures of differentiation. The left panels shows 
infrequent TF switching with rapid protein expression (k <C 13(A)). In this case the 
protein concentration in each cell tracks its transcriptional state well, phenotypes are 
well defined, and an ODE model describing switching between them works well. The 
right panels in Fig. 3 represent the case when TF switching is rapid, but change in 
protein expression is gradual. The initial and final states are identical to the scenario 
on the left. The gradual protein expression gives a large density of cells with intermediate 
protein expression on day 15, and no well-resolved phenotypes. 

Dashed lines in Fig. 3 define low and high expressing phenotypes, as is typical in the 
analysis of flow cytometry data. The number of cells in the low expressing phenotype is 
shown as a function of time in Fig. 4 for both scenarios. For rare switching, modeling 
the system with two phenotypes with population sizes X\ and X 2 , respectively, as 
dX l 



dt 

dX 2 



-kXx, 

dt +kX - < 8 > 
produces great fits to the data. In contrast, two-state modeling for the slow protein 
expression case is inaccurate. 

To model the data with ODEs and discrete states, several approaches could 
be taken. As is common in immunology [25] one could introduce sub-phenotypes, 
partitioning the cells into n > 2 domains by some predefined thresholding of their 
expressions, such that 

dXt v 
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-kiXi + z = 2, . ..,n-l, 



(ft 

^ = +k n _ 1 X n _ 1 . (9) 

One would then optimize the parameters fc, to produce the best fit to the data. This 
approach also has its limitations. The steady state distribution given by Eqs. (6) and (7) 
has a width, while Eq. (9) has a steady state where all cells are within the X n partition. 
Any overlap between the steady state distribution and the X n _i state will not be resolved 
by such a model. Additionally this method introduces spurious phenotypes having little 
to do with the underlying biology. Alternatively, one could make the transition rate k 
a function of time k(t). Like the previous case, this technique describes the data, but 
provides little insight into the biology of the system. 

3.2. Failure of single cell systems biology: Cell division 

The models presented here are constructed based on chemical number rather than 
concentration. This gives correspondence with fluorescence experiments and enables 
accurate estimation of stochastic effects. Upon cell division we must divide the contents 





100 




50 







CO 


100 










U 50 






O 




>> 







100 


'co 




S 




De 


50 









100 




50 








Slow k Fast (3 



Fast k Slow (3 



«i 
T 



A 


day 


A I dayO 

A 




day 15 


J day 15 




day 30 


\ day 30 




day 45 

A 


\ day 45 



a 2 a 1 

T T 
Expression Level 



"2 

T 



Figure 3. Expression profiles at multiple time points for slow state switching / fast 
protein production (left column) and fast switching / slow protein production (right 
column). The dashed line represents the half-way point between the steady state 
protein concentrations in the two states. ODE models typically count the number 
of cells above and below a threshold like this one to consider transition rates. The 
parameters of the two models are such that the steady state values and the overall 
protein relaxation times are the same. The shaded region corresponds to the number 
of low-expressing cells plotted in Fig. 4. In the simulation on the left: a\ = 94.5 
copies/day, a 2 = 190 copies/day, (3 = 1.0 day~\ k = 0.075 day -1 ; while on the right: 
cti = 5.0 copies/day, ot2 = 10.0 copies/day, /3 = 0.05 day -1 , k = 0.2 day -1 . 
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Figure 4. The total number of cells in the low-expression phenotype (as defined by 
the cutoff in Fig. 3) for the two scenarios. When the expression level equilibrates 
rapidly after the transcription factor is bound (dashed line), the system has a single 
characteristic decay time scale, and the number of cells in the low state can be modeled 
with a single ODE. When the protein expression dynamics are slow to respond (solid 
line), the decay of the population in the low state is non-exponential. 



of a cell in half (assuming symmetric cell division). This gave us the non-local PDE 
in Eq. (4). Such nonlocal partial differential equations are uncommon and most 
computational tools are ill-equipped to deal with them. The use of finite difference, finite 
element, and spectral methods in solving these types of equations has been studied in a 
series of papers [26, 27, 28]. For large dimensional systems, Monte-Carlo integration can 
provide a more efficient numerical solution. In these examples we use finite difference 
methods. 

Dilution of a dye: As a simple example of dilution by division, consider a dye such 
as CFSE or BrdU. These dyes are used to measure cell division rates in vivo and are 
frequently used in studying the cellular dynamics of immune responses. These dyes are 
not produced by the cells and are degraded slowly, yielding 7 = 0. This removes the 
advection term in Eq. (4) yielding: 



For a dye that initially has a narrow Gaussian distribution in cells, we have the 
output shown in Fig. 5. This system has been well described using ODE models [29, 30], 
with a single ODE for the number of cells in each peak. We note that, for brevity, we 
are using a model with exponentially distributed division times. For rapidly dividing 
cells, more detailed models of cell cycle provide greater accuracy [30, 13]. 



dp(A,t) 
dt 



Hp(A,t) +Afj,p(2A,t). 



(10) 



Dilution and homeostasis: For a chemical that is produced in the cell, division 
can bias the population-expression. Fig. 6 shows an example of this effect. Here we 
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Figure 5. Dilution of dye upon cell division. The rightmost peak is the initial 
undivided cells. Each peak to the left is cells that have gone through an additional 
division. Here /i = 0.07 day -1 . 



have simulated two populations of cells, one not-dividing (solid) and one undergoing 
homeostatic division (cell death and division rates are equal, dashed curve). These 
curves are stationary distributions generated by the equation: 

where we have also included the stochastic effects of the chemical dynamics. To keep 
the system from growing, we have cell death rate equal to division rate, giving an extra 
factor of 2 in the second to last term. 

Fig. 6 shows stable distributions for this system with and without fi = 0. As we can 
see, cell division biases the distribution, reducing the mean and increasing the width. 
In general, the more rapid the division, the more exaggerated the effects. If the division 
rate exceeds the chemical degradation rate (3, the stable distribution is very different 
from what is seen here, and is centered close to A = 0. 

Statistical deviations resulting from cell division have been studied in previous work 
[31, 32, 33]. This type of noise is typically considered extrinsic noise [31]. It is also often 
approximated as a local and continuous process and incorporated into chemical decay 
terms [34], a modeling choice which omits many of the effects illustrated in this section. 



Dilution and expansion: Fig. 7 shows a simulation of a two-dimensional system 
where the vertical axis represents a chemical A\ that is produced by the cell, as in Eq. (4) 
(Fig. 6), and the horizontal axis represents a dye concentration A 2 , with dynamics as in 
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Figure 6. Stable distributions from a non-dividing population and one that 
is undergoing homeostatic division. Homeostatic division biases the population- 
expression, reducing the mean and increasing the width. The effects of cell division on 
the population-expression are greater the more rapid the division rate is. Parameter 
values were a = 200 copies/day, /3=0.4 day -1 , fj, = 0.07 day -1 . 



Eq. (10) (Fig. 5). Here the population is expanding rather than undergoing homeostatic 
division. The equation describing the dynamics of the system is: 

dp{Al d ^ t] = -^[(a-^A l )p(A 1 ,A 2 ,t)]+ 1 -^[(a + pA 1 )p(A h A 2 ,t)} 

- Hp(A 1 ,A 2 ,t) + 8np(Ai,A 2 ,t), (12) 

The simulation considers a system where cells are initially in an equilibrium 
distribution for a non-dividing population (solid curve in Fig. 6 for vertical axis, and day 
density in Fig. 5 for horizontal axis). Beginning on day in this simulation, the cells 
are stimulated to divide. This simulation has correspondence with resting lymphocytes 
that are dyed with CFSE before the system is infected on day zero, initiating rapid 
lymphocyte division. Contours are drawn with logarithmic spacing. On day 7 shown in 
Fig. 7, Ai is diluted as the population divides. There has been no internal change in the 
chemical dynamics as is typically considered in down-regulation of a gene-product. The 
use of population-expression models can help to discriminate between a down regulation 
where production rate a is decreased and where simple dilution is occurring. 

Cell division and spurious correlations: Another effect of cell division is that two 
chemical quantities that have independent dynamics can have correlations generated 
by cell division. Cell division will cut both otherwise independent quantities in half 
simultaneously. The population-expression equation is: 

dp{Al d ^ t] = -^[{a-^A l )p{A^A 2 ,t)} + ~[{a + PA l )p{A 1 ,A 2 ,t)} 

2 2 

-2pp(A 1 ,A 2 ,t) + 8pp(A 1 ,A 2 ,t), (13) 

having a similar form to Eq. (12). However, A 2 also obeys a simple gene-product rate 
law, and we include an extra factor of 2 in the second to last term for homeostatic 
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Figure 7. Simulation of a population of cells on day 7 for a two dimensional system 
with chemical A\ obeying chemical rate law dA\jdt — a — (3A\, and Ai representing 
a dye. Contours are spaced logarithmically and we have included stochastic effects 
of A expression. The population initially had A\ distributed at equilibrium for a 
non-dividing population. At day the population began dividing which dilutes A\ 
and A2, though Ai is produced in the cell giving a vertical spread. This simulation 
has correspondence with a resting population of lymphocytes that is dyed and then 
stimulated by an infection on day resulting in rapid expansion. Thought there is no 
change in the production rate a dilution gives a reduction in expression. Population- 
expression models can discriminate between reduction in expression resulting from a 



change in chemical dynamics and this simple dilution. Here /i 
copies/day, (3 = 0.08 day" 1 . 
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division. 

A simulation of the equilibrium distribution of Eq. (13) is shown in Fig. 8. This 
is a two-dimensional extension of Fig. 6. The non-dividing population corresponding 
to the solid curve in Fig. 6 is depicted at top, and the correlated spread resulting from 
cell division shown at bottom. The asymmetry in the distribution is a result of the A 2 
dynamics being more rapid than the A\ dynamics (5 > (5). 

We note that correlated fluctuations in expression levels are frequently used to infer 
the structure of genetic regulatory [35, 36], signaling [ ], and metabolic networks [38]. 
Failing to account for the effects of cell division in such an analysis can lead to the 
incorrect reconstruction of the genetic network. Spurious correlations between gene- 
products are strongest for pairs where both have slow degradation rates. Correlations 
in gene-product expression result very naturally from cell division. These correlations 
are typically grouped with other forms of extrinsic noise [31]. Population-expression 
models allow us to resolve the relative magnitude of different noise sources in extrinsic 
noise, potentially improving genetic regulatory network reconstruction methods. 
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Figure 8. Cell division introduces correlations between otherwise independent gene 
products. Here A\ and A-i have simple rate laws (dA\/dt = a — /3A\ and dA^fdt = 
S — eAi with Ai dynamics faster than A\). In a non-dividing population, we see that 
these products are not correlated (top). In dividing cells, both A\ and A-i are halved 
at the same time (cell division) introducing correlations in expression level (bottom). 
The effects of cell division should be accounted for when analyzing expression data for 
correlations to avoid spurious conclusions. Numerical values used in this simulation 
were a = 200 copies/day, (3=0A day -1 , S = 800 copies/day, e=1.6 day -1 , and /i = 
0.07 day -1 for the dividing population. 



3.3. Failure of single cell systems biology: selection bias 

Consider now a two gene example with influx and selection. Here there is an initial 
population of cells localized around (A®, A®). These cells have chemical dynamics such 
that at t — 0, A\ begins to rapidly decrease and A2 begins to gradually increase. The 
population dynamics that underlies selection in this system arises from changes in the 
rate of division and death of cells in a manner dependent on the concentrations of A\ 
and A 2 within the cell. We set the division rate proportional to Ax and the death rate 
proportional to A<i- The system also has a gradual influx of cells Yi^Ax^AA entering 
the system around (A®, A®). Here we do not consider the effects of dilution with cell 



Population- expression models of immune response 



14 



division. 

The system is described, using the vector notation, by: 

71 = a-0A 1 , (14) 

72 =5-eA 2 , (15) 

Y t = ~ V ■ [lP\ + V(DVp) + dA lP - dA 2 p + T(A U A 2 ), (16) 

where we have used terms introduced in Section 2 and omitted the dependence of terms 
on the quantities A\ and A 2 for brevity. 

Fig. 9 shows the evolution of the density in the A X ,A 2 plane. We see that, by 
day 10, the initial population has proliferated and progressed along the differentiation 
pathway. By day 50 we see there has been considerable proliferation and the cells that 
are furthest along in the differentiation pathway have begun to decay. We also see 
the effects of the gradual influx of new cells at day 50 where the population now has 
a tail of recent immigrants that have proliferated. By day 100 the initial population 
has decayed completely and there is a stable distribution. This stable distribution is 
maintained by the influx of new cells and not by a lack of cell turnover. There is no 
change in "phenotypic state" for the cells in this simulation, meaning cells maintain the 
same production and degradation values for A\ and A 2 throughout the simulation. 

The average differentiation path of a cell is given by the solid black curve in Fig. 9. 
Though the stable distribution is localized, it does not correspond to the phenotypic 
state described by the A\, A 2 dynamics. The population dynamics gives a strong 
bias to the distribution that is not predicted by the chemical dynamics alone. In 
statistical physics it is common to use the "fluctuation-dissipation theorem" to estimate 
model parameters from the equilibrium distribution. Any such analysis of a dynamic 
population must also take selective effects into account [39]. 

Rather than discuss the dynamics of A% and A 2 separately we can discuss the 
differentiation of cells moving along the one dimensional average path (black curve). 
To do this we introduce the variable a where cells enter the system at a = and the 
differentiation pathway takes them towards a — 1; though, as seen in Fig. 9 Day 100, 
they may never reach a = 1. In this reduced model we also neglect the stochastic effects 
and the only heterogeneity in the system is due to the influx of new cells. 

The one dimensional description is given by: 

7 =e-ea, (17) 

Ma) = ^ - (~ (1 - a)"<, ( 18 ) 

A 2 (a) = A° 2 (l - a) + -a, (19) 



dt = ~da^~ r?a)p(a ' t)] + dAl ^P ~ dA ^)p + r(a = 0). (20) 
In this one-dimensional model we still have a distribution of cells since influx of a = 
cells gives diversity to the system. In the absence of this influx we can describe the 
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Figure 9. A two dimensional system with selection. Here cells proliferate with rate 
proportional to A\ and die with rate proportional to A 2 . An initial population localized 
around (A®, j4°) in the lower right (Day 0) and proliferates as they begin to differentiate 
(Day 10). There is a constant but gradual influx of new cells entering the system around 
(AjjAij) that can be seen biasing the population by day 50, giving the population a 
tail of higher A\ expression. The A 2 dynamics are slower and cells die before they 
ever reach the steady state predicted by the chemical dynamics (dashed black circle at 
Day 100) . Instead a steady state that is a product of the population dynamics and the 
differentiation is reached which requires constant influx to maintain. The black curve 
illustrates the mean trajectory of the cells as predicted by the chemical dynamics. In 
this simulation a = 100 copies/day (3=1 day -1 , 5 = 13.33 copies/day, e=0.07 day -1 , 
d = 0.04 copies _1 day _1 . 



population with an ODE model where the population has an internal variable (a zero- 
dimensional approximation) : 
da 

- = e-ea, (21) 

= tL4i(a)X(t) - dA 2 {a)X{t). (22) 

where Ax(a) and A 2 (a) are described by Eq. (18) and (19). This approach was recently 
used to describe the exhaustion of CD8 T cells during a chronic infection where the 
internal variable corresponded to the level of exhaustion in the population and where 
thymic influx could be neglected [40]. 



4. Choosing the right variables 

Traditional flow cytometry interrogates large numbers of cells. However the information 
from a single cell is limited by the spectral overlap of the fluorescent dyes to measuring 
the concentration of about fifteen different molecules. Soon, new techniques such as 
Cy-TOF [41] (which merges mass-spectrometry with flow cytometry) will allow us to 
overcome this limitation and obtain simultaneous measurements of the concentration 
of hundreds of molecules at the single cell level. As the dimensionality increases, 
the techniques of population-expression modeling become computationally intractable. 
This necessitates dimensional reduction and identification of "key players" among the 
measured molecular expressions. At the same time, even as we measure more and more 
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quantities, some of the key players will still be omitted, forcing us to look for such 
important missing links. 

In the simplest case, the expression dynamics for all chemical species in the system 
would be determined by a few key regulators, B^, i.e., dAi/dt = (Xi(B, Ai)—/3i(B, Ai)+r], 
where a and j3 are the production/degradation functions, and 77 is the noise term. B^ can 
be an individual chemical species, or more likely some function of many of the individual 
expressions. The goal is to find the minimal set B from data, or to understand if the 
data does not provide sufficient information to do so. 

There is no single universal approach for dealing with large-dimensional data 
that would solve both of these problems in the immunological context. In fact the 
problems are not unique to immunology, or even to biology. Classic dimensionality 
reduction techniques include Principal Components Analysis (PCA) [42], Independent 
Components Analysis (ICA) [43], LASSO regression [44], and other approaches that 
explicitly identify (locally) linear subspaces spanned by data [45, 46, 47]. Many of 
these would be problematic in immunology since they measure importance by explained 
variance, which changes depending on the measurement units used. For example in 
PCA, using the measured brightness or its logarithm as the raw data may give very 
different results. The problem is solved elegantly with information-theoretic approaches, 
which are manifestly reparameterization invariant [48]. 

For this and related reasons, some of the most successful dimensionality reduction 
approaches in quantitative cell biology (and in computational neuroscience) have relied 
on information-theoretic techniques. For example, finding pairs of genes with high 
mutual information among their microarray mRNA expression profiles that cannot 
be explained away by confounding effects of other regulatory interactions uncovers 
"minimal" transcriptional regulatory networks in cells as complex as human lymphocytes 
[35]. Higher order information-theoretic analyses [49] further disambiguate scenarios 
where simple pairwise interactions do not explain the data and more complex regulatory 
patterns are needed instead, (e.g. two or more factors regulating expression [50]). 
Similarly, searching for projections of the combinatorially complex stimulus space that 
preserve the information about the rate of spiking is one of the most powerful methods 
for finding receptive fields of neurons from electrophysiology data [ ]. All of these 
approaches are special cases of the rate-distortion framework [48], where a "small" 
description of data is sought that nonetheless preserves the information about the 
variable of relevance [48, 52]. The balance between the amount of information kept 
and the model size is controlled by the needs of the modeler and the data availability. 

These methods should work for the context of immunology, but some changes are 
needed. First, typical immunology flow cytometry experiments make it hard to assay 
many different phenotypic or temporal conditions, as is typically used for information- 
theoretic analyses [35, 51]. This limits the range of variation of the data and can 
artificially reduce the values of the measured information quantities. Luckily, as 
demonstrated in [53], having many (tens of) thousands of single cell measurements 
allows accurate information estimation in these scenarios. However, it is crucial for the 
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measurements to be of a very high accuracy. 

The second distinction of immunological data is that, in the foreseeable future, 
the number of profiled quantities will be in the hundreds, but not in the thousands, 
with cell surface molecules being the easiest to profile. This leaves a possibility for 
missing key regulators in the data sets. As was demonstrated recently [53], information 
theoretic analysis can detect when such important regulators are missing. This is done 
by observing that a missing regulator induces complex statistical dependences among 
all of its targets that cannot be explained by simple pairwise correlations [ >]. While 
identification of such missing regulators in a semi- automated fashion is possible [50], the 
smaller dimensionality of the immunological data requires resetting the balance between 
the precision and the recall. 

The third, and the most fundamental, distinction of immunological data is their 
population-expression nature. As illustrated in Fig. 8, cell division and death introduces 
spurious statistical relations among the measured expressions. Distinguishing effects 
of regulation vs. population on the interactions among the measured variables should 
be possible by measuring the statistics of relations among physically non-interacting 
variables in experimental data and in numerical simulations. 



Day Day 1 Day 10 




Figure 10. A two dimensional system of coupled stochastic biochemical species, with 
deterministic dynamics as in Eqs. (23) and (24), with ao = 200 copies/day, a± = 
2000 copies/day /3=22 day -1 , K = 30 copies, n — 6. A\ exhibits bistability. Since 
it controls the expression of A%, the distribution of the latter is also bimodal. Notice 
the asymmetry of the contour plots of the joint probability distribution. By itself, 
such asymmetry, as in the central panel, simply signals unequal regulation of the two 
species. However, time series measurements will notice that the population average of 
dAijdt is correlated with the population average of A±, but not the other way around. 
Graphically, this corresponds to the population escaping from the low expression steady 
state along the A\ direction first, with A2 following. This is a signal of the potential 
causal regulation Ai —> A%. 



Since development and differentiation of immune cells is fast and can be tracked in 
flow cytometry experiments on the scale of days, the data offers an ability to establish 
causality of regulation [5 1]. This is in contrast to identification of non-causal, symmetric 
relations among variables in most systems biology or computational neuroscience data 
analysis approaches. We illustrate this on the example of two coupled biochemical 
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species obeying the deterministic dynamics 

dAi a x A^ „ A , . 

— - = a + = \ -f3A 1 , 23 
dt K n + A™ v ' 

dA 2 ctiAl „ A , . 

dt K n + A? H 1 K J 

Here A\ is self-regulating and can have two stable expression levels. A 2 is regulated by 
Ai and will also be bimodal, but there's a clear difference between the two variables. 
Solution of the corresponding Fokker-Planck system is shown in Fig. 10, illustrating that 
the dynamics of the transient shapes of the joint probability distribution can signal the 
causality of regulatory relations. 

This could be confirmed experimentally by sorting the cell population at an 
early time (e.g. day 1) into subpopulations based on expression levels. The contrast 
between the dynamics of the Ai-high, A 2 -low subpopulation and the Ai-low, A 2 - 
high subpopulation would reveal which is the driver of the system. These sorted 
subpopulations would be placed into animals where they are recognized by unrelated 
genetic markers (e.g. Thyl.l) and monitored to see which subpopulation reaches Ax- 
high, v4 2 -high more rapidly. 



5. Conclusion 



Modeling in systems immunology is still in its infancy. Modeling requires identifying the 
key players and parameters that describe the behavior of interest. Population-expression 
models provide a tool for interpreting the changing expression profiles of multi-cellular 
populations that are differentiating while dividing and undergoing selection. They 
achieve this by connecting the population scale with intracellular systems biology. 

The interpretation of immunological data has typically consisted of enumerating 
cellular phenotypes and describing how the sizes of these populations change over time. 
In contrast, the interpretation of data with population expression models focuses on the 
chemical interaction network common to all these phenotypes, and on the dependence 
of expression levels on division and death rates. One could instead continue adding 
additional phenotypic states to more accurately describe the data, but this is reminiscent 
of the "epicycles on epicycles" used to described the motion of the planets in the 
Ptolemaic geocentric model of the universe. Looking at the problem differently can 
yield both simplicity and insight. 

A complete view of systems biology would capture population dynamics, within- 
cell systems biology, and spatial effects. The spatial effects like clustering can occur at 
different scales. At the within-cell scale for example, clustering of molecules in the cell 
membrane plays an important role in the detection of antigen (infected cells) by T cells. 
At the population level, pathogens can be localized to the specific tissues and organs 
which they infect, while B and T cell responses occur in other sites such as the lymph 
nodes. Some spatial effects can be easily incorporated into the population-expression 
framework. The population-expression models are well suited to compartmentalization, 
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where one considers a population-expression equation for different tissues and expression 
dependent trafficking rates between these compartments. For finer scale spatial effects, 
the population-expression approach breaks down, as the PDEs assume large numbers 
of cells in the compartments. In these low density regimes one must instead consider a 
model which treats cells discretely. In molecular systems biology, master equations and 
discrete stochastic simulations using Gillespie and related algorithms are very commonly 
used to describe the discreteness of stochastic changes in the phenotype of individual 
cells [55] alongside continuous Fokker-Planck and Langevin equation approaches. For 
methodological purposes, we built the current work around the population-expression 
analogue of the Fokker-Planck equation. However, it is clearly possible to develop 
the corresponding master equations and stochastic simulation algorithms, where the 
number of cells in a certain chemical state would be tracked. Nonlocal transitions due 
to cell division and related phenomena are not conceptually difficult to implement in 
such approaches, but the number of types of possible transitions, and hence the time 
complexity of a simulation, might grow excessively because of the nonlocality. We leave 
the development of these simulation algorithms for future publications. 

Advances in a field often require the integration of theoretical and experimental 
approaches. In the past the use of cellular dynamics data, such as flow cytometric data, 
typically allowed us to enumerate large numbers (millions) of cells but restricted us 
to making a handful of measurements on each cell, limiting the phenotypic resolution. 
The extension of traditional flow cytometry to Cy-TOF [11] allows the measurement of 
hundreds of biochemical species simultaneously at the single cell level. This allows, for 
the first time, tracking cellular systems biology dynamics and the population dynamics 
simultaneously and with high accuracy. The aim is to understand interactions among 
internal states of single cells and the composition of cellular populations, and hence 
the responses of the populations to infections. In this article we touched upon key 
problems that need to be addressed for such analysis: simultaneous representation of 
molecular system and population dynamics, including proliferation and cell death, and 
identification of key components of regulatory networks. We outlined a few ways in 
which these problems can be tackled computationally, by modifying current analysis 
approaches and by introducing population-expression modeling. 
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